Database Mining: A Performance Perspective

Authors

  • Rakesh Agrawal
  • Tomasz Imielinski
  • Arun N. Swami
Abstract

We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be uniformly viewed as requiring discovery of rules embedded in massive data. We describe a model and some basic operations for the process of rule discovery. We show how the database mining problems we consider map to this model and how they can be solved by using the basic operations we propose. We give an example of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm not only is efficient in discovering classification rules but also has accuracy comparable to ID3, one of the current best classifiers.
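As a rough illustration of what "discovery of rules embedded in data" can mean for the association class of problems, the sketch below measures how often one item set co-occurs with another in a small list of transactions. The sample transactions, the function name `rule_stats`, and the support and confidence measures are assumptions chosen for illustration; the abstract does not specify them, and this is not the algorithm described in the paper.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of all transactions containing both item sets
    confidence = fraction of antecedent-containing transactions that
                 also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that contain every item of both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy data, not taken from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence = rule_stats(transactions, {"bread"}, {"milk"})
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

A rule is typically kept only when both measures exceed chosen thresholds; scanning every transaction for each candidate rule, as done here, is the kind of cost that a performance-oriented approach would aim to reduce on massive data.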


Similar resources

An overview of interval encoded temporal mining involving prioritized mining, fuzzy mining, and positive and negative rule mining

ISSN:0975-9646 Databases and data warehouses have become a vital part of many organizations. So useful information and helpful knowledge have to be mined from transactions. In real life, media information has time attributes either implicitly or explicitly called as temporal data. This paper focuses on an encoding method for the temporal database that reduces the memory utilization during proce...


Three perspectives of data mining

This paper reviews three recent books on data mining written from three different perspectives, i.e. databases, machine learning, and statistics. Although the exploration in this paper is suggestive instead of conclusive, it reveals that besides some common properties, different perspectives lay strong emphases on different aspects of data mining. The emphasis of the database perspective is on ...


Privacy Protection Using Sensitive Data Protection Algorithm In Frequent Itemset Mining Of Medical Datasets

Frequent Itemset Mining (FIM) is one of the most eminent techniques in the Data mining systems. The exploration of Frequent Itemset Mining distills the recurring knowledge from the incessant data. Explosion of Frequent Itemset Mining in the field of Data Analysis and Data Mining becomes an inescapable one. The paper focuses on “searching the accurate records of efficient database queries withou...


Survey of Clustering Data Mining Techniques

Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns...


Diagnosis of diabetes by using a data mining method based on native data

Background & Aim: Detecting the abnormal performance of diabetes and subsequently getting proper treatment can reduce the mortality associated with the disease. Also, timely diagnosis will result in irreversible complications for the patient. The aim of this study was to determine the status of diabetes mellitus using data mining techniques. Methods: This is an analytical study and its databas...



Journal:
  • IEEE Trans. Knowl. Data Eng.

Volume 5

Publication date: 1993